Cloudflare Outage November 2025 Recap 

01 Dec 2025 | Pingdom Team

What is Cloudflare? 

Cloudflare is a web infrastructure and security company that handles roughly 20% of global internet traffic through its content delivery network (CDN), security products, and performance optimization tools. Millions of websites, including major platforms like ChatGPT, Spotify, and X, rely on Cloudflare for uptime and fast, secure experiences.

What Happened on November 18, 2025? 

On November 18, 2025, beginning at 11:20 UTC, Cloudflare experienced a critical outage caused by a malfunction in the configuration process of its Bot Management system. An update triggered the creation of an oversized internal configuration file, which overwhelmed Cloudflare's core proxy services. As a result, websites and apps protected by Cloudflare returned HTTP 5xx errors to users until the issue was fully resolved at 17:06 UTC. The most severe impact lasted approximately three hours, and the remaining time was spent stabilizing services.

Scope of Outage 

  • The outage started at 11:20 UTC, and all services were restored by 17:06 UTC the same day.
  • Major services, including ChatGPT, X, Spotify, Canva, and thousands of smaller sites, were unreachable or experienced serious error rates.
  • Both end-user applications and internal Cloudflare dashboards were impacted, with login failures, delayed account access, and configuration issues widely reported.
  • Early warning signs included spikes in HTTP 5xx errors and declining network availability. The incident was first detected automatically at 11:31 UTC and was escalated to manual investigation within minutes.
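
The early warning sign described above, a spike in HTTP 5xx errors, is something a site owner can watch for on their own side. The following is a minimal sketch, not Cloudflare's or Pingdom's detection logic: the class name `ErrorRateMonitor`, the window size, and the alert threshold are all hypothetical values chosen for illustration.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the share of HTTP 5xx responses among the most recent requests.

    Hypothetical helper for illustration; window_size and threshold are
    assumptions and should be tuned to real traffic.
    """

    def __init__(self, window_size=100, threshold=0.05):
        # deque with maxlen drops the oldest entry once the window is full
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, status_code):
        # Store True for server errors (500-599), False for everything else
        self.window.append(500 <= status_code <= 599)

    def error_rate(self):
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)

    def is_alerting(self):
        return self.error_rate() > self.threshold


# Example: 3 of the last 10 responses are server errors
monitor = ErrorRateMonitor(window_size=10, threshold=0.2)
for status in [200, 200, 503, 200, 500, 502, 200, 200, 200, 200]:
    monitor.record(status)

print(monitor.error_rate(), monitor.is_alerting())  # prints: 0.3 True
```

A sliding window keeps the check responsive to sudden spikes while ignoring a single stray error. In practice the status codes would come from access logs or an external uptime check rather than a hard-coded list.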

Root Cause of the Outage 

Cloudflare's own post-incident report attributes the outage to an internal database permissions issue. That issue caused the Bot Management module to generate and distribute a "feature file" that grew far beyond its expected size. The Cloudflare proxy software, which is responsible for routing critical traffic, then crashed on thousands of edge machines when it tried to load the file. No evidence points to a cyberattack or external malicious action. Emergency fixes involved stopping the deployment of the bad file, distributing a last-known-good configuration, and hardening future update pathways. Cloudflare has publicly apologized and committed to additional resilience by improving configuration file validation, adding global safety switches, and reviewing error handling logic.

Lessons for Web Teams: Building Resilience 

Major outages, such as Cloudflare's, reveal operational gaps and stress-test your recovery plans. Reviewing your architecture and incident playbooks now helps minimize downtime and data loss if a core infrastructure provider fails.

  • Automate monitoring for configuration and file integrity, especially for machine-generated content used in sensitive routing or security decisions. 
  • Maintain known-good versions of critical configs and support fast rollback processes. 
  • Regularly test and harden fault tolerance for interdependent services to prevent cascading failures. 
  • Proactively review failure modes across all modules to minimize single points of failure and latent bugs.
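
The first two recommendations above (validating machine-generated configuration and keeping a known-good version for rollback) can be sketched as follows. This is an illustrative example under stated assumptions, not Cloudflare's actual fix: the JSON layout with a `features` list, the size and entry limits, and the function names `validate_config` and `deploy_config` are all hypothetical.

```python
import json
import shutil
from pathlib import Path

MAX_BYTES = 64 * 1024  # hypothetical ceiling on file size
MAX_ENTRIES = 200      # hypothetical ceiling on number of entries


def validate_config(path, max_bytes=MAX_BYTES, max_entries=MAX_ENTRIES):
    """Return a list of problems found; an empty list means the file passed."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]

    size = path.stat().st_size
    if size > max_bytes:
        # Reject before parsing so an oversized file is never loaded
        return [f"file is {size} bytes, limit is {max_bytes}"]

    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    entries = data.get("features") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return ["missing 'features' list"]

    problems = []
    if len(entries) > max_entries:
        problems.append(f"{len(entries)} entries, limit is {max_entries}")

    # Duplicate names suggest the generator emitted the same rows twice
    names = [e["name"] for e in entries
             if isinstance(e, dict) and isinstance(e.get("name"), str)]
    if len(set(names)) != len(names):
        problems.append("duplicate feature names")
    return problems


def deploy_config(candidate, active, known_good):
    """Promote candidate to active if valid; otherwise restore known_good.

    Returns (deployed, problems). Updating known_good is left to a separate
    step that runs only after the new config has proven healthy.
    """
    problems = validate_config(candidate)
    if problems:
        shutil.copyfile(known_good, active)
        return False, problems
    shutil.copyfile(candidate, active)
    return True, []
```

The design choice worth noting is that a failed validation rolls back to a stored copy instead of raising an error inside the serving path, so a bad generated file degrades to stale configuration rather than an outage.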

How SolarWinds Pingdom Can Help Protect Your Website 

SolarWinds® Pingdom® offers proactive website availability and performance monitoring from multiple global regions, helping IT and web admins detect outages early and respond efficiently. It delivers real-time insights into user experience and provides fast troubleshooting tools to track incidents and bottlenecks as soon as issues arise. Teams that use Pingdom can implement continuous monitoring and alerting across distributed infrastructure, reducing downtime and improving incident response. 
